Dimensionality Reduction for Data Visualization
Abstract
Dimensionality reduction is one of the basic operations in the toolbox of data analysts and designers of machine learning and pattern recognition systems. Given a large set of measured variables but few observations, an obvious idea is to reduce the degrees of freedom in the measurements by representing them with a smaller set of more “condensed” variables. Another reason for reducing the dimensionality is to reduce the computational load in further processing. A third reason is visualization. “Looking at the data” is a central ingredient of exploratory data analysis, the first stage of data analysis, where the goal is to make sense of the data before proceeding with more goal-directed modeling and analyses. Although these tasks seem alike, it has turned out that their solutions need different tools. In this article we show that dimensionality reduction for data visualization can be represented as an information retrieval task, where the quality of a visualization can be measured by precision and recall and their smoothed extensions, and that a visualization can be optimized to directly maximize this quality for any desired tradeoff between precision and recall, yielding very well-performing visualization methods.
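As a rough illustration of the information retrieval view described in the abstract, the sketch below treats each point's nearest neighbors in the original space as the "relevant" items and its nearest neighbors in the low-dimensional display as the "retrieved" items, then averages precision and recall over all points. This is a minimal sketch under stated assumptions, not the article's method: the function names (`knn_sets`, `neighborhood_precision_recall`), the use of hard k-nearest-neighbor sets, and the Euclidean metric are illustrative choices, and the smoothed extensions mentioned in the abstract are not implemented.

```python
import numpy as np


def knn_sets(points, k):
    """Return, for each row of `points`, the set of indices of its k nearest
    neighbors under Euclidean distance (the point itself is excluded)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is never its own neighbor
    order = np.argsort(dists, axis=1)[:, :k]
    return [set(row.tolist()) for row in order]


def neighborhood_precision_recall(high, low, k_relevant, k_retrieved):
    """Mean precision and recall of display neighborhoods.

    high: (n, D) array, original data (hypothetical input name).
    low:  (n, d) array, low-dimensional coordinates of the same n points.
    Relevant items: k_relevant nearest neighbors in `high`.
    Retrieved items: k_retrieved nearest neighbors in `low`.
    """
    relevant = knn_sets(high, k_relevant)
    retrieved = knn_sets(low, k_retrieved)
    precisions, recalls = [], []
    for rel, ret in zip(relevant, retrieved):
        hits = len(rel & ret)
        precisions.append(hits / len(ret))  # share of retrieved that are relevant
        recalls.append(hits / len(rel))     # share of relevant that are retrieved
    return float(np.mean(precisions)), float(np.mean(recalls))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(50, 10))
    # Assumed stand-in for a visualization: keep only the first two coordinates.
    display = data[:, :2]
    p, r = neighborhood_precision_recall(data, display, k_relevant=5, k_retrieved=5)
    print(f"precision={p:.3f} recall={r:.3f}")
```

In this simplified setting, retrieving more display neighbors than there are relevant ones tends to raise recall at the cost of precision, which is the tradeoff the abstract refers to.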
Similar resources
Nonlinear Dimensionality Reduction
The visual interpretation of data is an essential step to guide any further processing or decision making. Dimensionality reduction (or manifold learning) tools may be used for visualization if the resulting dimension is constrained to be 2 or 3. The field of machine learning has developed numerous nonlinear dimensionality reduction tools in the last decades. However, the diversity of methods r...
Dimensionality reduction techniques for multivariate data classification, interactive visualization, and analysis - systematic feature selection vs. extraction
The curse of dimensionality, i.e., the fact that feature spaces of increasing dimensionality with finite sample sizes tend to be empty, has given incentive to a plethora of research activities in various disciplines and diverse application fields, e.g., statistics or neural networks. Three major application fields are multivariate data classification, data analysis, and data visualization. In t...
Dimensionality reduction for financial data visualization
Various data mining methods are used for examining large financial data sets to uncover hidden and useful information. The ability to access big data sources raises new challenges related to the capability to handle such enormous amounts of data. This research focuses on big financial data visualization that is based on dimensionality reduction methods. We use a data set that contains financial ratio...
Information Retrieval Perspective to Nonlinear Dimensionality Reduction for Data Visualization
Nonlinear dimensionality reduction methods are often used to visualize high-dimensional data, although the existing methods have been designed for other related tasks such as manifold learning. It has been difficult to assess the quality of visualizations since the task has not been well-defined. We give a rigorous definition for a specific visualization task, resulting in quantifiable goodness...
A framework for the visualization of multidimensional and multivariate data
High dimensionality is a major challenge for data visualization. Parameter optimization problems require an understanding of the behaviour of an objective function in an n-dimensional space around the optimum; this is multidimensional visualization and is a natural extension of the traditional domain of scientific visualization. Large numeric data tables with observations of many attributes requ...
Publication date: 2010